Parallel vs. Sequential Belief Propagation 
Decoding of LDPC Codes over GF(q) 
and Markov Sources 



Nadav Yacov, Hadar Efraim, Haggai Kfir, Ido Kanter and Ori Shental 



Abstract 

A sequential updating scheme (SUS) for belief propagation (BP) decoding of LDPC 
codes over Galois fields, GF(q), and correlated Markov sources is proposed and compared 
with the standard parallel updating scheme (PUS). A thorough experimental 
study of various transmission settings indicates that the number of iterations the BP 
algorithm needs to converge (and hence its complexity) under the SUS is about one half 
of that under the PUS, independent of the finite field size q. Moreover, this 1/2 factor 
appears regardless of the correlations of the source and the channel's noise model, while 
the error correction performance remains unchanged. These results may point to the 
'universality' of the one half convergence speed-up of SUS decoding. 

Index Terms: LDPC codes over GF(q), Markov sources, belief propagation, joint 
source-channel decoding, sequential updating. 
1 Introduction 



Low density parity check (LDPC) codes, first invented by Gallager in 1962 [1] and long after- 
wards rediscovered in the seminal work of MacKay and Neal (MN, [2]), play a fundamental 
role in modern communications, primarily due to their near-Shannon limit performance. 
An almost optimal, yet tractable decoding [2] of this class of codes is empowered by the 
renowned probabilistic message passing algorithm of belief propagation (BP, [3]). 

It was also shown [4] that the remarkable error performance of Gallager's binary LDPC 
codes can be enhanced even further by a generalization to higher finite Galois 
fields, GF(q) with integer q > 2. This behavior can be rationalized by the fact that the 
graphical representation of such codes has fewer edges and consequently relatively longer 
loops. 

As BP is an iterative algorithm, its convergence rate is also a crucial benchmark in its 
implementation as a decoder. In principle, as the communication channel becomes noisier 
the decoding time increases, since the latter is dictated by the total number of BP message 
passing iterations required for convergence [5]. Furthermore, this number of iterations tends 
to diverge when approaching the channel's capacity [6]. 

Kfir and Kanter [7] were the first to introduce a serialization method which was shown 
empirically to halve the decoding time and complexity w.r.t. parallel (flooding) 
scheduling, while the error performance does not deteriorate. 
Subsequently, several sequential (serial) BP message passing schedules were introduced 
(e.g., [8] and references therein). It was shown, either via semi-analytical methods [8-10] 
or by simulations [11], that such sequential schedules converge faster than the standard 
parallel schedule. 

Despite the aforementioned contributions addressing binary LDPC codes, there has 
been no examination of the effect of serialization on LDPC codes over GF(q). In this letter, 
based on a thorough experimental study of a GF(q) extension of the serialization scheme 
originally proposed by Kfir and Kanter [7], we find that sequential decoding over 
GF(q) not only accelerates BP convergence w.r.t. standard flooding but, interestingly, 
does so with the same 1/2 convergence ratio. The error correction performance is roughly 
preserved. 

In addition, the convergence of sequential decoding is investigated for correlated 
information sources, an issue yet to be discussed in the literature, which has so far addressed 
only independent and identically distributed (i.i.d.) sources. Here, the source is modelled 
by a Markov process, while a method of dynamical block priors [12] is incorporated within 
the BP decoding in order to exploit the prior knowledge on the source statistics and form a 
joint source-channel decoding scheme (the move to GF(q) enables the treatment of Markov 
sequences with a richer alphabet). The same factor of 1/2 emerges in this case too, 
accelerating the convergence of sequential BP decoding substantially. These results are 
corroborated via simulations for the binary symmetric channel (BSC), the binary erasure 
channel (BEC) and the binary-input additive white Gaussian noise (BI-AWGN) channel. 

To sum up, the one half factor is found to be robust to the following extensions: 1) 
extension of binary sources, q = 2, to higher finite fields, q > 2, 2) extension of BP for i.i.d. 
sources to the case of BP with dynamical priors used for joint source-channel decoding, and 
3) extension of the BSC case to other popular channel models. This robustness may point 
to the 'universality' of the 1/2 convergence speed-up ratio of sequential BP decoding. 
2 Sequential and Parallel Joint Source-Channel Decoding 



Consider an MN code with two sparse matrices known to both the transmitter and the 
receiver, A (M x N) and B (M x M), where N and M are the source block length 
and the transmitted block length, respectively. All non-zero elements in A and B are taken 
randomly from $\{1, 2, \ldots, q-1\} \subset$ GF(q), and B must be invertible. 

A proper construction of these matrices is crucial in order to ensure capacity-achieving 
performance. In this work we follow the Kanter-Saad (KS, [6]) construction, which yields 
very sparse, simple to construct matrices, known to perform very close to the bound. 

Encoding a source vector s into a codeword t, with rate R = N/M, is thus performed by 

$$ t = B^{-1} A s \bmod q, \tag{1} $$

where t is converted to binary representation and transmitted over the channel. During 
transmission, the coded information t is corrupted by a noise vector n, resulting in the 
received vector r = t + n mod 2.^1 

Upon receipt, the decoder converts r back to the original field and computes the 
syndrome z = Br, which can be reformulated as 

$$ z = B(B^{-1} A s + n) = [A\,B]\,x = Hx \bmod q, \tag{2} $$

where the operator [·] denotes appending of matrices, and the vector x is a concatenation of s 
and n. The decoding problem is solved efficiently using the BP algorithm, as follows. 
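
The identity in Eq. (2) can be checked numerically. The following is a minimal sketch for the binary case q = 2 only; it assumes small dense random matrices instead of the sparse KS construction, and a unit lower-triangular B so that invertibility over GF(2) is guaranteed. The helper name `solve_unit_lower_gf2` and all sizes are illustrative choices, not part of the original scheme.

```python
import numpy as np

def solve_unit_lower_gf2(B, y):
    """Solve B t = y over GF(2) by forward substitution (B unit lower-triangular)."""
    t = np.zeros(len(y), dtype=int)
    for i in range(len(y)):
        # Over GF(2) subtraction equals addition: t_i = y_i + sum_{k<i} B_ik t_k.
        t[i] = (y[i] + B[i, :i] @ t[:i]) % 2
    return t

rng = np.random.default_rng(0)
N, M = 4, 12                                   # toy sizes, rate R = N/M = 1/3
A = rng.integers(0, 2, size=(M, N))            # dense stand-in for the sparse A
B = np.tril(rng.integers(0, 2, size=(M, M)), k=-1) + np.eye(M, dtype=int)

s = rng.integers(0, 2, size=N)                 # source vector
t = solve_unit_lower_gf2(B, (A @ s) % 2)       # Eq. (1): t = B^{-1} A s
n = (rng.random(M) < 0.1).astype(int)          # BSC noise, assumed flip rate 0.1
r = (t + n) % 2                                # received vector

z = (B @ r) % 2                                # syndrome computed by the decoder
H = np.hstack([A, B])                          # H = [A B]
x = np.concatenate([s, n])                     # x is the concatenation of s and n
print(np.array_equal(z, (H @ x) % 2))          # checks z = Hx mod 2
```

The decoder never needs t itself: it works only with z, H and the priors on the source and noise parts of x.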

The non-zero elements in row i of the matrix H represent the bits of vector x 
participating in the corresponding check, $z_i$. The non-zero elements in column j represent the 
checks in which the j'th bit participates. For each non-zero element in H, the algorithm 
calculates two types of coefficients. 

The coefficient $q_{ij}^a$ stands for the probability that the bit $x_j$ is $a \in$ GF(q), taking into 
account the information of all checks in which it participates, except for the i'th check. 
The coefficient $r_{ij}^a$ indicates the probability that the bit $x_j$ is a, taking into account the 
information of all bits participating in the i'th check, except for the j'th bit. 

The parallel updating scheme (PUS) consists of alternating horizontal and vertical passes 
over the H matrix. Each pair of horizontal and vertical passes is defined as an iteration. In 
the horizontal pass, all the $r_{ij}^a$ coefficients are updated, row after row, by 

$$ r_{ij}^{a} = \sum_{\{x_{j'}\}:\; x_j = a,\ \text{satisfying } z_i} \ \prod_{j' \neq j} q_{ij'}^{x_{j'}}, \tag{3} $$

where the sum runs over all configurations with $x_j = a$ that satisfy the check $z_i$, and 
it is clear that the multiplication is performed only over the non-zero elements of the 
matrix H. 
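
As an illustration of Eq. (3), the sketch below evaluates the horizontal update for a single check by brute-force enumeration. It assumes a prime field size p, so that field arithmetic reduces to integer arithmetic mod p; GF(4) and GF(8) would need polynomial arithmetic instead. The function name and argument layout are hypothetical, and the enumeration is exponential in the check degree, so this is meant for exposition rather than as a practical decoder.

```python
import itertools
import numpy as np

def horizontal_update(h, q_in, z, p):
    """Brute-force Eq. (3) for one check over a prime field GF(p).

    h    : non-zero coefficients of the k bits taking part in the check
    q_in : array (k, p), q_in[j, a] = q_ij^a for this check i
    z    : syndrome symbol z_i
    Returns r of shape (k, p) with r[j, a] = r_ij^a.
    """
    k = len(h)
    r = np.zeros((k, p))
    for config in itertools.product(range(p), repeat=k):
        # Keep only configurations that satisfy the check.
        if sum(hc * xc for hc, xc in zip(h, config)) % p != z:
            continue
        for j in range(k):
            weight = 1.0
            for jp in range(k):
                if jp != j:                      # product excludes the target bit
                    weight *= q_in[jp, config[jp]]
            r[j, config[j]] += weight
    return r

# Toy check x0 + 2*x1 + x2 = 0 over GF(3): x0 = 1 and x1 = 2 are certain,
# x2 is unknown, so the check forces x2 = 1.
q_in = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1 / 3, 1 / 3, 1 / 3]])
r = horizontal_update([1, 2, 1], q_in, z=0, p=3)
print(r[2])
```

Because every coefficient is invertible in the field, fixing the other bits determines the target bit uniquely, so each row of `r` sums to one whenever the rows of `q_in` do.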

In the vertical pass, all $q_{ij}^a$ are computed, column by column, using the updated values 
of $r_{ij}^a$: 

$$ q_{ij}^{a} = \alpha_{ij}\, p_j^{a} \prod_{i' \neq i} r_{i'j}^{a}, \tag{4} $$

where $\alpha_{ij}$ is a normalization factor such that $\sum_a q_{ij}^a = 1$, and $p_j^a$ represents the prior 
knowledge about bit j being in state a. Now the pseudo-posterior probability can be 
computed by 

$$ Q_j^{a} = \alpha_j\, p_j^{a} \prod_{i} r_{ij}^{a}. \tag{5} $$



^1 Hereinafter, for exposition purposes, a BSC with flip rate f is assumed, although an extension to other 
channel models is straightforward and results for such models are addressed in the following. 
Again, $\alpha_j$ is a normalization constant satisfying $\sum_a Q_j^a = 1$, and i runs only over non-zero 
elements of H. Each iteration ends by generating the estimates $\hat{x}$ by clipping the $Q_j$'s. 

At the end of each iteration a convergence test, checking whether $\hat{x}$ solves $H\hat{x} = z$, is performed. 
If some of the M equations are violated, the algorithm turns to the next iteration, until 
either convergence or a pre-defined maximal number of iterations is reached. Note that no 
information is exchanged between the bits within an iteration: all $r_{ij}^a$ values are updated using 
the previous iteration's data. 
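
The vertical pass of Eqs. (4) and (5) for a single column can be sketched as follows. The function name, the array layout, and the reading of "clipping" as picking the most probable symbol are assumptions made for illustration.

```python
import numpy as np

def vertical_update(r_col, prior):
    """Eqs. (4)-(5) for one column j.

    r_col : array (d, q), r_col[i, a] = r_ij^a for the d checks of bit j
    prior : array (q,), prior[a] = p_j^a
    Returns (q_out, Q, symbol): messages q_ij^a, pseudo-posterior Q_j^a,
    and the clipped estimate (assumed here to be the argmax of Q_j).
    """
    d, q = r_col.shape
    q_out = np.empty((d, q))
    for i in range(d):
        others = np.delete(r_col, i, axis=0)     # leave out check i, Eq. (4)
        unnorm = prior * others.prod(axis=0)
        q_out[i] = unnorm / unnorm.sum()         # alpha_ij normalization
    Q = prior * r_col.prod(axis=0)               # all checks, Eq. (5)
    Q = Q / Q.sum()                              # alpha_j normalization
    return q_out, Q, int(np.argmax(Q))

# Toy column over a 3-symbol alphabet with two checks and a uniform prior.
r_col = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.25, 0.25]])
q_out, Q, symbol = vertical_update(r_col, np.full(3, 1 / 3))
print(q_out, Q, symbol)
```

With two checks and a uniform prior, the message sent back to each check is simply the other check's opinion, which makes the "all checks except i" rule easy to see.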

In the proposed sequential updating scheme (SUS), we perform the horizontal and 
vertical passes separately for each bit in x. A single sequential iteration for the bit $x_j$ consists 
of the following steps: 

1. For a given j, all $r_{ij}^a$ are updated. More precisely, for all non-zero elements in column 
j of H, use Eq. (3) for updating $r_{ij}^a$. Note that this is only a partial horizontal pass, 
since only the $r_{ij}^a$'s belonging to a specific column are updated. 

2. After all $r_{ij}^a$'s belonging to column j are updated, a vertical pass as defined in Eq. (4) 
is performed over column j. Again, this is a partial vertical pass, referring only to 
one column. 

3. Steps 1-2 are repeated for the next column, until all columns in H are updated. 

4. Finally, the pseudo-posterior probability value $Q_j^a$ is computed by Eq. (5). 

After all variable nodes are updated, the algorithm continues as for the parallel scheme: 
clipping, checking the validity of the M equations and proceeding to the next iteration. 
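
The difference between the two schedules can be sketched in code. The example below is restricted to the binary case q = 2 and uses the standard shortcut of storing each message as the difference between its probabilities for 0 and 1, which is an assumption of this sketch and not notation from the text. The function names and the toy matrix are hypothetical. The demo is a sanity check only; a 3 x 6 toy matrix says nothing about the iteration counts reported for long codes.

```python
import numpy as np

EPS = 1e-12

def check_msg(H, dq, z, i, j):
    """Binary form of Eq. (3): dr_ij = (-1)^z_i * prod_{j' != j} dq_ij'."""
    cols = np.flatnonzero(H[i])
    others = cols[cols != j]
    dr = (-1) ** int(z[i]) * np.prod(dq[i, others])
    return float(np.clip(dr, -1 + EPS, 1 - EPS))

def var_update(H, dr, p1, dq, j):
    """Eqs. (4)-(5) for column j: refresh dq[:, j] in place, return P(x_j = 1)."""
    rows = np.flatnonzero(H[:, j])
    r1 = (1 - dr[rows, j]) / 2
    r0 = 1 - r1
    for k, i in enumerate(rows):
        a1 = p1[j] * np.prod(np.delete(r1, k))        # leave out check i
        a0 = (1 - p1[j]) * np.prod(np.delete(r0, k))
        dq[i, j] = (a0 - a1) / (a0 + a1)
    Q1 = p1[j] * np.prod(r1)
    Q0 = (1 - p1[j]) * np.prod(r0)
    return Q1 / (Q0 + Q1)

def decode(H, z, p1, schedule, max_iter=100):
    M, n = H.shape
    dq = H * (1.0 - 2.0 * p1)          # initial messages equal the priors
    dr = np.zeros((M, n))
    post = np.array(p1, dtype=float)
    for it in range(1, max_iter + 1):
        if schedule == "PUS":          # full horizontal pass, then full vertical pass
            for i in range(M):
                for j in np.flatnonzero(H[i]):
                    dr[i, j] = check_msg(H, dq, z, i, j)
            for j in range(n):
                post[j] = var_update(H, dr, p1, dq, j)
        else:                          # SUS: steps 1-2 column by column
            for j in range(n):
                for i in np.flatnonzero(H[:, j]):
                    dr[i, j] = check_msg(H, dq, z, i, j)
                post[j] = var_update(H, dr, p1, dq, j)
        x_hat = (post > 0.5).astype(int)              # clipping
        if np.array_equal((H @ x_hat) % 2, z):        # convergence test
            return x_hat, it
    return x_hat, max_iter

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])
x_true = np.array([1, 0, 1, 1, 0, 0])
z = (H @ x_true) % 2
p1 = np.array([0.3, 0.1, 0.9, 0.9, 0.1, 0.1])   # prior of bit 0 leans the wrong way
x_pus, it_pus = decode(H, z, p1, "PUS")
x_sus, it_sus = decode(H, z, p1, "SUS")
print(x_pus, it_pus, x_sus, it_sus)
```

The only difference between the two branches is when `dq` is refreshed: under the PUS every check message of an iteration is computed from the previous iteration's `dq`, whereas under the SUS the messages of column j are already visible to the checks visited for column j + 1.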

As for the priors, side information on the Markovian nature of the source is dynamically 
incorporated within the decoding process. During the vertical pass, prior knowledge is 
assigned to each decoded symbol according to the assumed statistics (evidently, for the i.i.d. 
case this would simply be Pr(s = a) = 1/q for all the source symbols). 

The key here is that one can re-estimate and re-assign these priors after every iteration. 
Consider, for instance, three successive GF(q) source symbols, $s_{j-1} = a$, $s_j = b$ and $s_{j+1} = c$. 
The prior $\Pr(s_j = b)$ can be obtained by the formula^2 

$$ p_j^{b} = \sum_{a,c=0}^{q-1} Q_{j-1}^{a}\, M_{ab}\, M_{bc}\, Q_{j+1}^{c}, \tag{6} $$

where M is the measured Markov transition matrix. The complexity of this updating rule 
is $O(q^2)$. Reducing it to $O(q)$, Eq. (6) can be rewritten as 

$$ p_j^{b} = \sum_{a=0}^{q-1} Q_{j-1}^{a}\, M_{ab} \sum_{c=0}^{q-1} M_{bc}\, Q_{j+1}^{c}. \tag{7} $$
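
A short numerical check that the factorized form of Eq. (7) agrees with the double sum of Eq. (6) is given below. The function names and the random test data are illustrative, and the priors are left unnormalized here on the assumption that the normalization is absorbed later by the factor in Eq. (4).

```python
import numpy as np

def prior_double_sum(Q_prev, Q_next, M):
    """Eq. (6): explicit double sum over a and c for every b."""
    q = len(Q_prev)
    p = np.zeros(q)
    for b in range(q):
        for a in range(q):
            for c in range(q):
                p[b] += Q_prev[a] * M[a, b] * M[b, c] * Q_next[c]
    return p

def prior_factorized(Q_prev, Q_next, M):
    """Eq. (7): two independent single sums, multiplied per b."""
    left = Q_prev @ M        # left[b]  = sum_a Q_prev[a] * M[a, b]
    right = M @ Q_next       # right[b] = sum_c M[b, c] * Q_next[c]
    return left * right

rng = np.random.default_rng(1)
q = 4
M = rng.random((q, q))
M /= M.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
Q_prev = rng.dirichlet(np.ones(q))     # pseudo-posterior of symbol j-1
Q_next = rng.dirichlet(np.ones(q))     # pseudo-posterior of symbol j+1

p6 = prior_double_sum(Q_prev, Q_next, M)
p7 = prior_factorized(Q_prev, Q_next, M)
print(np.allclose(p6, p7))
```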

As for the noise bits (j > N), the coefficient is initialized, for instance in the BSC case, 
to be $Q_j^a = f^{L} (1 - f)^{\log_2 q - L}$, where L is the number of 1's in the binary representation of the 
symbol a. Then we set $q_{ij}^a = Q_j^a$ for all non-zero elements in the j'th column. 
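
The BSC initialization of the noise symbols can be written out directly. This is a small sketch in which the function name is hypothetical and q is assumed to be a power of two, so that each symbol maps to log2(q) bits.

```python
def bsc_symbol_prior(f, q):
    """Prior of each noise symbol a in GF(q) for a BSC with flip rate f."""
    bits = q.bit_length() - 1                 # log2(q), assuming q is a power of two
    prior = []
    for a in range(q):
        ones = bin(a).count("1")              # L: number of flipped bits in symbol a
        prior.append(f ** ones * (1 - f) ** (bits - ones))
    return prior

prior4 = bsc_symbol_prior(0.1, 4)
print(prior4, sum(prior4))
```

The entries sum to one for any f, since they are the terms of the binomial expansion of (f + (1 - f)) raised to the power log2(q).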

Note that the complexity per single iteration is almost the same for both updating 
methods. Hence, the gain in iterations actually represents the gain in decoding complexity. 

^2 This prior updating equation is slightly different from the one originally suggested by Kfir and Kanter 
([12], Eq. (11)). We find this equation empirically better. 
3 Results 



In order to evaluate the nature of BP convergence, our experimental study consists of 
decoding various LDPC codes over GF(q) for both correlated and uncorrelated information 
sources. We perform simulations of decoding over the popular BSC, BEC and BI-AWGN 
channels, using various noise levels (i.e., flip rates, erasure rates and signal to noise 
ratios, respectively). The noise levels are chosen to be close to the noise threshold of the 
code in order to get substantial convergence time. The KS code construction is used with 
$N = 10000/\log_2 q$ source symbols. 

Table 1 compares the average convergence and error rates for the SUS and PUS. An 
LDPC code of rate 1/3 is used, although similar results are obtained for other code 
rates, and the statistics are collected over at least 6000 different samples. The correlated 
sources are modeled by adopting typical 2-state (2S) and 4-state (4S) Markov processes 
with entropy 1/2. The SUS converges in about one half of the iterations required by the PUS. 
This factor is consistent regardless of the GF order and the source correlation. The standard 
deviation is relatively small for all cases. Note that this speed-up does not deteriorate the 
code's error performance. As our statistics are collected over ~6000 samples of block size 
$10^4$, we do not report the exact value for bit error rates (BER) below $10^{-6}$. 

Fig. 1 presents the ratio between PUS and SUS in the percentage of corrected bits per 
iteration as a function of the total percentage of correct bits. To be more precise, the 
correction gain $\Delta P$, i.e. the difference between the percentage (w.r.t. the block's size) of bits 
corrected in the following iteration and in the current iteration, is calculated and the 
average ratio for PUS and SUS, 

$$ \frac{\langle \Delta P_{PUS} \rangle}{\langle \Delta P_{SUS} \rangle}, $$

is drawn as a function of the current percentage of correct bits P. 

It can be easily seen that the one half ratio is preserved throughout the decoding 
process, regardless of the decoding dynamics and the system's state. These results are 
obtained for transmitting a 2S-Markov source via a BSC (f = 0.23) using a GF(8) rate 1/3 
LDPC code, although similar results are found for the BEC and BI-AWGN channel cases, 
using all other investigated transmission settings, as listed in Table 1. Notice that the last 
point in Fig. 1, indicating the end of the decoding process, is higher (around 0.7), since at 
this final stage the error correction improvement by SUS cannot be twice that achieved for 
PUS. 
Table 1: SUS vs. PUS: convergence and error rates 

Channel   GF  Source     Noise      <t_SUS>       <t_PUS>       <t_SUS>/<t_PUS>  <t_SUS/t_PUS>  STDEV(t_SUS)  BER (x 10^-6) 
                                    (iterations)  (iterations)                                                SUS, PUS 
BSC       2   I.I.D.     f = 0.155  25.06         50.73         0.494            0.541          0.080         5.28, 5.50 
BSC       4   Markov 4S  f = 0.227  28.22         55.66         0.507            0.513          0.055         1.54, 1.53 
BSC       8   Markov 2S  f = 0.230  25.57         50.04         0.511            0.518          0.073         1.74, 1.74 
BEC       2   I.I.D.     e = 0.58   17.60         35.07         0.502            0.503          0.051         < 1, < 1 
BEC       4   Markov 4S  e = 0.78   22.98         44.54         0.516            0.527          0.070         1.9, 2.0 
BEC       8   Markov 2S  e = 0.76   20.02         40.70         0.492            0.498          0.063         < 1, < 1 
BI-AWGN   2   I.I.D.     σ = 0.98   20.77         42.64         0.487            0.497          0.026         < 1, < 1 
BI-AWGN   4   Markov 4S  σ = 1.64   19.27         37.28         0.517            0.534          0.087         6.2, 6.0 
BI-AWGN   8   Markov 2S  σ = 1.66   16.55         32.14         0.515            0.511          0.058         2.6, 2.5 
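
As a quick arithmetic check on Table 1, the ratio-of-averages column can be recomputed from the two average iteration counts. The snippet below only re-derives numbers already listed in the table; the variable names are arbitrary.

```python
# (<t_SUS>, <t_PUS>, reported <t_SUS>/<t_PUS>) copied from Table 1, row by row.
rows = [
    (25.06, 50.73, 0.494), (28.22, 55.66, 0.507), (25.57, 50.04, 0.511),  # BSC
    (17.60, 35.07, 0.502), (22.98, 44.54, 0.516), (20.02, 40.70, 0.492),  # BEC
    (20.77, 42.64, 0.487), (19.27, 37.28, 0.517), (16.55, 32.14, 0.515),  # BI-AWGN
]

for t_sus, t_pus, reported in rows:
    print(f"{t_sus / t_pus:.3f} (recomputed) vs {reported:.3f} (reported)")

# Largest gap between the recomputed and the reported ratio.
max_dev = max(abs(t_sus / t_pus - reported) for t_sus, t_pus, reported in rows)
mean_ratio = sum(t_sus / t_pus for t_sus, t_pus, _ in rows) / len(rows)
print(max_dev, mean_ratio)
```

All nine recomputed ratios agree with the reported column to within rounding, and they lie between 0.487 and 0.517.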



[Figure 1: plot omitted. Horizontal axis: % Correct Bits, with ticks from 0.77 to 0.97.] 

Figure 1: The relative improvement in correct bits for PUS/SUS as a function of the 
current percentage of correct bits. Using BSC, N = 3333 (9999 bits), R = 1/3, f = 0.23, 
GF(8), 10000 samples and a 2S Markov source. 